Documentation for query execution concepts. #22537
🔴 Invalid Changelog category: Documentation (changelog entry is not required)
✅ Documentation build: revision built successfully. Build logs, warnings (1).
Hey , it has been 38 business-hours since the author's last update, could you please review? |
🔄 New commits pushed — @lopatinevgeny please take a look. |
The text is good, but I would take the comments into account.
We should use DataShard and ColumnShard everywhere, but the picture says Datashard and Columnshard.
Can Compute Tasks be launched on nodes that
- are not the entry point
- do not host a CS or DS?
My answer is yes, but the picture does not show it.
We can make the picture more complex, I don't mind, but it is schematic anyway, so I don't think that would add understanding.
3. **Parsing and Plan Cache Lookup**
On the server side, the {{ ydb-short-name }} node that receives your query first parses and analyzes it for correctness. Before planning execution, {{ ydb-short-name }} checks whether a physical execution plan for this query already exists in the query cache. If a cached plan is found, it can be reused to save time and resources.
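
The cache lookup described in step 3 can be sketched roughly as follows. This is a minimal illustration, not the {{ ydb-short-name }} implementation: the `PlanCache` class, the choice of raw query text as the cache key, and the LRU eviction policy are all assumptions made for the example. The quoted text says only that an existing physical plan can be reused.

```python
from collections import OrderedDict


class PlanCache:
    """Hypothetical plan cache: maps query text to a compiled plan (LRU eviction assumed)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._plans = OrderedDict()
        self.compilations = 0  # counts how often the expensive path runs

    def get_plan(self, query_text):
        # Cache hit: reuse the stored plan and mark it as recently used.
        if query_text in self._plans:
            self._plans.move_to_end(query_text)
            return self._plans[query_text]
        # Cache miss: compile, store, and evict the least recently used entry if full.
        plan = self._compile(query_text)
        self._plans[query_text] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)
        return plan

    def _compile(self, query_text):
        # Stand-in for parsing, optimization, and physical planning.
        self.compilations += 1
        return ("plan", query_text)


cache = PlanCache(capacity=2)
first = cache.get_plan("SELECT 1")
second = cache.get_plan("SELECT 1")  # served from the cache, no recompilation
```

In this sketch the second call returns the same plan object and `cache.compilations` stays at 1, which is the saving the paragraph refers to.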

4. **Query Optimization and Plan Preparation**
CBO is not mentioned. Suppose it is implicitly part of Query Optimization. But there is in fact a very important stage of collecting statistics for the CBO, including the path by which statistics become available to the optimizer. That path is not described.
CBO gets a separate article, yes.
Statistics requires a separate article too, or we could try to describe it briefly in the CBO article, since the CBO is its consumer.
{{ ydb-short-name }} provides a unified query interface capable of efficiently handling diverse workloads, from high-throughput [Online Transaction Processing (OLTP)](https://en.wikipedia.org/wiki/Online_transaction_processing) to large-scale [Online Analytical Processing (OLAP)](https://en.wikipedia.org/wiki/Online_analytical_processing) queries. With this approach, applications can run transactional and analytical queries transparently, without having to use different APIs for different workloads.

{{ ydb-short-name }} uses a distributed query execution engine designed for high scalability and efficiency in large, distributed environments. When you run a query, {{ ydb-short-name }} automatically breaks the work down across multiple nodes, taking advantage of data locality: data is processed where it is stored whenever possible. This reduces unnecessary data movement across the network. Additionally, {{ ydb-short-name }} leverages advanced features like compute pushdown, where filters and computations are pushed closer to the data storage layer, further improving performance. These techniques enable {{ ydb-short-name }} to efficiently handle complex queries and large workloads across clusters of machines.
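
The effect of compute pushdown on data movement can be illustrated with a toy model. The sketch below is not how {{ ydb-short-name }} is implemented: shards are plain Python lists, "network transfer" is just a row count, and both function names are hypothetical. It shows only why filtering next to the data reduces the number of rows that leave a shard.

```python
def scan_without_pushdown(shards, predicate):
    """Ship every row to the compute side, then filter there."""
    transferred = [row for shard in shards for row in shard]
    result = [row for row in transferred if predicate(row)]
    return result, len(transferred)


def scan_with_pushdown(shards, predicate):
    """Evaluate the filter on each shard; only matching rows are shipped."""
    transferred = [row for shard in shards for row in shard if predicate(row)]
    return transferred, len(transferred)


# Two hypothetical shards holding integer rows.
shards = [[1, 5, 9], [2, 6, 10]]


def is_large(row):
    return row > 5


rows_plain, moved_plain = scan_without_pushdown(shards, is_large)
rows_pushed, moved_pushed = scan_with_pushdown(shards, is_large)
```

Both variants return the same rows, `[9, 6, 10]`, but the first moves 6 rows between the shards and the compute side while the second moves 3.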
I would add one more section about the physical side. The points I would like to cover:
- We use gRPC, which means several sessions, and therefore several transactions, can and do run in parallel over a single TCP connection. This differs quite a lot from, for example, Postgres, so it is worth covering.
- A query inside the server spawns several actors but does not spawn an OS process. That is why we can sustain a large number of in-flight queries, far more than traditional databases.
- I would write about balancing: client sessions are automatically distributed across the cluster nodes. Again, this is a very important property; if it is described elsewhere, I would link to it. The reader may not realize that balancing exists at the level of sessions and TCP connections.
Apart from session balancing, I think what you are proposing lies outside query processing. The article is about query execution; the subtleties of TCP and the actor system are unlikely to improve readability here, since that is a different level of abstraction.
Balancing is already mentioned and can be expanded a little.
Result sets in {{ ydb-short-name }} can be arbitrarily large. To efficiently handle large amounts of data, {{ ydb-short-name }} streams result sets back to the client in parts (chunks). This streaming approach lets clients begin processing the results right away without waiting for the entire result set to be transferred. As a result, applications can handle large datasets quickly and with minimal memory usage.
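
The chunked delivery described above can be modelled with a generator. This is a minimal sketch that assumes a fixed chunk size and an in-memory row source; `stream_in_chunks` and `consume` are hypothetical names, not part of any {{ ydb-short-name }} SDK.

```python
from itertools import islice


def stream_in_chunks(rows, chunk_size):
    """Yield the result set as successive lists of at most chunk_size rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def consume(chunks):
    """Process each chunk as it arrives; only one chunk is held in memory at a time."""
    total = 0
    for chunk in chunks:
        total += sum(chunk)
    return total


chunks = list(stream_in_chunks(range(7), 3))
total = consume(stream_in_chunks(range(7), 3))
```

Here `chunks` is `[[0, 1, 2], [3, 4, 5], [6]]` and `total` is 21. The consumer starts working after the first chunk and never needs the whole result set at once.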

## Limitations
I would add a note about the lack of checkpoints. Because of this, long-running queries may fail to complete if nodes restart.
Which checkpoints do you mean? The term can mean very different things, and it would be odd to write here that they do not exist.
It is probably better to phrase this in terms of error handling: our current error-handling mechanisms cannot resume query execution after node restarts, so long-running queries may suffer during restarts?
/backport |
Co-authored-by: Ivan Blinkov <[email protected]> Co-authored-by: orange13 <[email protected]> (cherry picked from commit 741b5df)
|
Successfully created backport PR for |
Changelog entry
...
Changelog category
Description for reviewers
...